ABSTRACT

One formulation of confidence scoring requires the
examinee to indicate as a number his personal probability of the
correctness of each alternative in a multiple-choice test. For this
formulation the expected value of a linear transformation of the
logarithm of the response to the correct alternative is maximized if
the examinee accurately reports his personal probability. To equate
omit scores with choice scores, the
transformation can be chosen so that the score is zero if the
examinee indicates complete uncertainty. If this is done, the scoring
function depends on the number of alternatives. One could also align
uncertainty and response omission by granting credit for omitting
items, though it is felt this might be hard to explain. (Author)
SUMMARY



Problem 

One formulation of confidence scoring requires the examinee to indicate as a number his personal
probability of the correctness of each alternative in a multiple-choice test. For this formulation it has been
shown that a linear transformation of the logarithm of the correct alternative is a scoring function which
maximizes the expected score of the examinee if he accurately reports his personal probabilities. The
present paper calculates the expected score corresponding to a chance level of personal probability, thus
allowing the equating of lack of response with complete uncertainty.

Approach 

The solution required can be reached merely by imposing an appropriate boundary condition on the 
solution to a differential equation. The condition is that when the examinee indicates a certainty equal to 
the reciprocal of the number of alternatives, the implied score should be zero or the implied score should be 
awarded when a response is omitted. 

Result 

If one grants score points for omitted items, one may equate omission scores to the chance score.
However, one must then explain to the examinees that credit may be given for omitted items.
Alternatively, one may modify the scoring formula by subtracting a constant to produce a zero when a
chance level of personal probability is indicated. In this case, the scoring formula is a function of the
number of alternatives. However, it is possible to ascribe uncertainty to the answer when that answer is not
the preferred answer; i.e., the method does not rigorously imply uncertainty.

Conclusion 

No single scoring system will handle all numbers of alternatives if omission and complete
uncertainty are to be aligned. If certainty at the level of the reciprocal of the number of alternatives is
rigorously to indicate complete uncertainty, the response format discussed here is not suitable.
PREFACE



The research reported in this memorandum is the result of work performed by the
Educational Testing Service, Princeton, NJ, under the provisions of Contract
F41609-70-C-0044. Project Monitor was Capt Wayne S. Sellman. The research was
conducted under Project 1121, Technical Training Development; Task 112103,
Evaluating Individual Proficiency and Technical Training Programs. Dr. Marty R.
Rockway was the Project Scientist and Capt Wayne S. Sellman was the Task Scientist.



AN APPROXIMATELY REPRODUCING SCORING SCHEME THAT 
ALIGNS RANDOM RESPONSE AND OMISSION 



The related problems of guessing and partial knowledge have received considerable attention
from test-oriented persons who are dissatisfied with the limited amount of information conveyed by the
responses to multiple-choice items. One way to increase this information without increasing the amount of
substantive interpretation required is to allow the examinee to indicate, for each alternative, his degree of
certainty, or probability, that the alternative is correct. In so doing, one may make the testing
process more palatable in that the examinee is allowed to communicate his unsureness and hence reduce the
presumed feelings of risk and anxiety associated with marking the "best" answer; he may have very mixed
feelings about the "bestness" of that answer.

It should be mentioned that while there is much interest in improving testing procedures, and 
confidence testing is strongly suggested by some (Shuford and Massengill, 1967), confidence testing should
not be embraced uncritically as an improvement. Some have reservations which stem from the fact that 
confidence testing requires the examinee to decide whether to take a risk and how much risk to take when 
making each response, as will be seen. With the usual multiple-choice testing this decision about possible 
risk may be less apparent to the examinee, and, hence, the personality factors operative in the two types of 
testing may not be the same. Swineford (1938 and 1941) has presented evidence of a relation between 
personality factors and risk taking in confidence testing quite apart from achievements involved. Therefore, 
one should take care to ascertain that the changed operations of personality factors introduced through 
confidence testing do not defeat the purpose of measurement. 

The present paper is not responsive to the problem of personality factors but to the treatment of
omitted responses. That is, it remains usual to coordinate the scoring of omissions with the rest of the scoring
procedure, and that is the function of this paper, at least for the confidence-testing format discussed below.
This format is one in which the examinee indicates his certainty of the correctness of each alternative as a
non-negative number, and the certainties recorded must sum to a specified total, such as unity in the case
where they are described as being probabilities of correctness. De Finetti (1962) has raised the question as
to whether, when this is done, the examinees will give a response directly indicative of their personal
probabilities of the correctness of the responses, and has introduced some scoring functions that are
maximized when the responses equal those personal probabilities (1965), the notion being that a rational
man will respond honestly when such behavior optimizes his expected score. Shuford, Albert, and
Massengill (1966) introduced a formalization of this notion, called the reproducing scoring property, and
have pointed out that when one scores only the correct response, the scoring function which is reproducing
is unique and is of the form

$$S = A \log Bx, \tag{1}$$

where S is the item score and x is the response to the correct alternative.¹ They have taken B as ten and A
as unity when the logarithm is to the base ten and introduced the arbitrary score of minus one when x is in
an interval below one hundredth (so that the scoring function will be bounded). Thus

$$S = 1 + \log_{10} x, \qquad .01 \le x \le 1 \tag{2}$$

in their formulation.

¹ The development of formula (1) can be carried out as follows. Let $S_h(r_h)$ be the score assigned if alternative h is
correct and the examinee has indicated an amount of certainty equal to $r_h$. Then if $p_h$ is his subjective probability that
alternative h is correct, his expected score over all alternatives is

$$E = \sum_h p_h S_h(r_h),$$

and it is desired to have E at a maximum when $r_h = p_h$, subject to the constraint that

$$\sum_h r_h = 1.$$

Thus the objective function

$$E = \sum_h p_h S_h(r_h) + A\Bigl(1 - \sum_h r_h\Bigr),$$

where A is the Lagrange multiplier imposing the condition that the r's sum to one, is maximized when

$$p_h \left.\frac{dS_h(r_h)}{dr_h}\right|_{r_h = p_h} = A$$

or

$$\frac{dS_h(p_h)}{dp_h} = \frac{A}{p_h}.$$

Therefore E is at a maximum when the scoring function S is

$$S = A \log Bx,$$

where x is the indicated certainty for the correct answer, and B is a constant of integration. The proof is ancillary to the
text of the paper but is included as it is quite a bit simpler than that given by Shuford et al. (1966).
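
The reproducing property behind formula (1) can be checked numerically. The following is a minimal sketch, not part of the original memorandum: the function names and the three-alternative probabilities are hypothetical, and the score used is formula (2) with the floor of minus one below one hundredth. It compares the expected score of an examinee who reports his personal probabilities with that of one who overstates his favorite alternative and one who claims complete uncertainty.

```python
import math

def log_score(x, floor=0.01):
    """Formula (2): 1 + log10(x), with the arbitrary score of -1 below the floor."""
    return 1.0 + math.log10(x) if x >= floor else -1.0

def expected_score(p, r):
    """Expected item score when personal probabilities are p and reported certainties are r."""
    return sum(p_h * log_score(r_h) for p_h, r_h in zip(p, r))

p = [0.6, 0.3, 0.1]                                 # hypothetical personal probabilities
honest = expected_score(p, p)                       # report exactly what is believed
overstated = expected_score(p, [0.8, 0.15, 0.05])   # exaggerate the favorite alternative
flat = expected_score(p, [1 / 3, 1 / 3, 1 / 3])     # claim complete uncertainty

print(round(honest, 3), round(overstated, 3), round(flat, 3))
```

Under these assumed probabilities the honest report yields the largest expected score, as the derivation leads one to expect. The floor makes the scheme only approximately reproducing for very small certainties.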

These choices may be overly arbitrary, however, in that no provision is made for the situation where
the examinee omits the item. For example, his score on an item about whose answer he hasn't the foggiest
notion should be the same whether he responds to it, indicating that he knows nothing about it, or
omits it. He should also not expect to receive more credit for marking at random at the end of a test than
the examinee who does not. To correct for omissions one might use formula (2) and assign a non-zero value
to the omitted items. For example, in a four-choice test the value of

$$S = 1 + \log_{10} .25$$

or about .4 is the score to be assigned to each omitted item. For two-, three-, and five-choice items the
scores assigned to omits would be about .7, .5, and .3, respectively. If these corrections for guessing are
used, they may, however, still prove unsatisfactory in that the examinee may have some difficulty
understanding why points should be given for omits and might adopt some truly pathological strategy out
of misunderstanding unless he thinks that omits will be physically ignored in the scoring process.
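
The omit scores implied by formula (2) follow from evaluating it at the chance certainty x = 1/k. As an illustrative sketch (the function name `omit_score` is hypothetical, and base-ten logarithms are assumed as in formula (2)), the values for two through five alternatives come out to about .7, .5, .4, and .3:

```python
import math

def omit_score(k):
    """Formula (2) evaluated at the chance certainty 1/k for a k-choice item."""
    return 1.0 + math.log10(1.0 / k)

# Score credited to an omitted item, rounded to one decimal place.
omit_scores = {k: round(omit_score(k), 1) for k in (2, 3, 4, 5)}
print(omit_scores)
```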

When using traditional formula scoring, one sets up the formula so that the average score under
random guessing is zero, and it is suggested here that the same could be done in the confidence-testing
situation by appropriate choice of A and B in the scoring function. This is done by setting B equal to the
number of alternatives. Then when the examinee marks his certainty as 1/k, where k is the number of
alternatives, as he would if he is indicating no information, the score would be the same as if he omitted
the item, in that either way the score is zero. Thus the formula

$$S = A(\log k + \log x) \tag{3}$$

takes on a zero when uncertainty is expressed and does so no matter which alternative the examinee marks.
Table 1 is provided with entries aligned with a zero assignment to omits, and the value used for the constant
A is

$$A = 1/(\log k),$$

which sets the upper bound of the score at unity. At the lower range of the table, where x approaches zero,
the value of the scoring function when x is .0 is used to keep the function bounded.
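
Formula (3) with A = 1/(log k) can be sketched in a few lines. This is an illustration only: the function name `aligned_score` is hypothetical, and the floor of .01 used to keep the score bounded is an assumption borrowed from formula (2), since the cutoff used for Table 1 is not legible in the text above.

```python
import math

def aligned_score(x, k, floor=0.01):
    """Formula (3) with A = 1/log(k): zero at x = 1/k, unity at x = 1.

    Certainties below the assumed floor are scored as if they equaled the floor,
    which keeps the function bounded as x approaches zero.
    """
    x = max(x, floor)
    a = 1.0 / math.log10(k)
    return a * (math.log10(k) + math.log10(x))

for k in (2, 3, 4, 5):
    # Chance-level certainty scores zero and full certainty scores one, for every k.
    print(k, round(aligned_score(1.0 / k, k), 6), round(aligned_score(1.0, k), 6))
```

Because A depends on k, a certainty of .5 earns zero on a two-choice item but a positive score on a four-choice item, which is the sense in which no single scoring function serves all numbers of alternatives.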

The alignment provided by adjusting the score for omits as suggested either way does not allow one
to distinguish the situation where a non-chance level of uncertainty is assigned to some other alternative
from one where all the responses are at the chance level. To handle this situation using only one response
per item, one might score only the highest certainty, rewarding the response differently when it is right than
when it is wrong (Boldt, 1971). When this is done, a chance response would indicate complete uncertainty
since the certainties must sum to one.